current query
Can Synthetic Query Rewrites Capture User Intent Better than Humans in Retrieval-Augmented Generation?
Zheng, JiaYing, Zhang, HaiNan, Pang, Liang, Tong, YongXin, Zheng, ZhiMing
Multi-turn RAG systems often face queries with colloquial omissions and ambiguous references, posing significant challenges for effective retrieval and generation. Traditional query rewriting relies on human annotators to clarify queries, but due to limitations in annotators' expressive ability and depth of understanding, manually rewritten queries often diverge from those needed in real-world RAG systems, resulting in a gap between user intent and system response. We observe that high-quality synthetic queries can better bridge this gap, achieving superior performance in both retrieval and generation compared to human rewrites. This raises an interesting question: Can rewriting models trained on synthetic queries better capture user intent than human annotators? In this paper, we propose SynRewrite, a synthetic data-driven query rewriting model that generates high-quality synthetic rewrites more aligned with user intent. To construct training data, we prompt GPT-4o with dialogue history, current queries, positive documents, and answers to synthesize high-quality rewrites. A Flan-T5 model is then fine-tuned on this dataset to map dialogue history and queries to synthetic rewrites. Finally, we further enhance the rewriter using the generator's feedback through the DPO algorithm to boost end-task performance. Experiments on the TopiOCQA and QReCC datasets show that SynRewrite consistently outperforms human rewrites in both retrieval and generation tasks. Our results demonstrate that synthetic rewrites can serve as a scalable and effective alternative to human annotations.
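The final DPO step above needs preference pairs over rewrites. A minimal sketch of how such pairs might be assembled from end-task feedback (the function name `score_rewrite` and the toy scorer are our assumptions, not the paper's code):

```python
# Hypothetical sketch of preference-pair construction for DPO: candidate
# rewrites are scored by downstream (retrieval/generation) quality, and the
# best and worst rewrite per query form a chosen/rejected pair.

def build_dpo_pairs(candidates, score_rewrite):
    """For each query, pair the highest- and lowest-scoring rewrites.

    candidates: dict mapping query -> list of candidate rewrite strings
    score_rewrite: callable(query, rewrite) -> float (end-task quality)
    """
    pairs = []
    for query, rewrites in candidates.items():
        scored = sorted(rewrites, key=lambda r: score_rewrite(query, r))
        best, worst = scored[-1], scored[0]
        # keep the pair only if there is an actual quality gap
        if score_rewrite(query, best) > score_rewrite(query, worst):
            pairs.append({"prompt": query, "chosen": best, "rejected": worst})
    return pairs

# Toy scorer for illustration: longer, more explicit rewrites score higher.
pairs = build_dpo_pairs(
    {"who won?": ["who won?", "who won the 2019 Cricket World Cup final?"]},
    lambda q, r: len(r),
)
```

In practice the scorer would be the generator's answer quality against the gold answer, rather than rewrite length.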
Query-oriented Data Augmentation for Session Search
Chen, Haonan, Dou, Zhicheng, Zhu, Yutao, Wen, Ji-Rong
Modeling contextual information in a search session has drawn more and more attention when understanding complex user intents. Recent methods are all data-driven, i.e., they train different models on large-scale search log data to identify the relevance between search contexts and candidate documents. The common training paradigm is to pair the search context with different candidate documents and train the model to rank the clicked documents higher than the unclicked ones. However, this paradigm neglects the symmetric nature of the relevance between the session context and document, i.e., the clicked documents can also be paired with different search contexts when training. In this work, we propose query-oriented data augmentation to enrich search logs and empower context modeling. We generate supplemental training pairs by altering the most important part of a search context, i.e., the current query, and train our model to rank the generated sequence along with the original sequence. This approach enables models to learn that the relevance of a document may vary as the session context changes, leading to a better understanding of users' search patterns. We develop several strategies to alter the current query, resulting in new training data with varying degrees of difficulty. Experiments on two large-scale public search logs demonstrate the effectiveness of our model.
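The augmentation idea above can be sketched in a few lines: clone a session, alter only its current (last) query with strategies of varying difficulty, and emit both sequences as training instances. This is an illustrative sketch, not the authors' code; `drop_term` stands in for one of their alteration strategies:

```python
# Query-oriented augmentation sketch: keep the session context fixed and
# perturb only the current query, producing extra training sequences.
import random

def augment_session(session, alter_fns, seed=0):
    """session: list of queries; the last element is the current query.
    alter_fns: alteration strategies of varying difficulty."""
    rng = random.Random(seed)
    augmented = [session]  # always keep the original sequence
    for alter in alter_fns:
        augmented.append(session[:-1] + [alter(session[-1], rng)])
    return augmented

def drop_term(query, rng):
    # one possible "hard" variant: remove a random term from the query
    terms = query.split()
    if len(terms) > 1:
        terms.pop(rng.randrange(len(terms)))
    return " ".join(terms)

variants = augment_session(["python pandas", "pandas merge howto"], [drop_term])
```

A ranker trained on both sequences then sees that the same clicked document can pair with slightly different contexts, which is the symmetry the paper exploits.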
CORM: Cache Optimization with Recent Message for Large Language Model Inference
Dai, Jincheng, Huang, Zhuowei, Jiang, Haiyun, Chen, Chen, Cai, Deng, Bi, Wei, Shi, Shuming
Large Language Models (LLMs), despite their remarkable performance across a wide range of tasks, necessitate substantial GPU memory and consume significant computational resources. Beyond the memory taken up by model weights, the memory used by the KV cache rises linearly with sequence length, becoming a primary bottleneck for inference. In this paper, we introduce an innovative method for optimizing the KV cache, which considerably minimizes its memory footprint. Upon thorough investigation, we discover that in most Transformer models, (i) there is a striking similarity between adjacent tokens' query vectors, and (ii) the attention calculation of the current query can rely exclusively on the attention information of a small fraction of preceding queries. Based on these observations, we present CORM, a KV cache eviction policy that dynamically retains essential key-value pairs for inference without the need for model fine-tuning. Our validation shows that CORM reduces the inference memory usage of KV cache by up to 70% with negligible performance degradation across six tasks in LongBench. Furthermore, we demonstrate that CORM is compatible with GQA for further compression rate.
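Observations (i) and (ii) suggest a simple eviction rule: since adjacent queries attend similarly, a cached key that no recent query attended to can be evicted. The sketch below is our reading of the abstract, not the released implementation; `recent` and `threshold` are illustrative hyperparameters:

```python
# CORM-style eviction sketch: a cached key is retained only if at least one
# of the most recent `recent` queries gave it attention above `threshold`.

def retained_keys(attn, recent=4, threshold=0.02):
    """attn: list of per-query attention rows (each row sums to ~1 over keys).
    Returns a keep/evict boolean per cached key."""
    num_keys = len(attn[0])
    window = attn[-recent:]  # only recent queries vote, per observation (i)
    return [any(row[k] > threshold for row in window) for k in range(num_keys)]

# Two queries over three cached keys; with recent=1 only the last query votes,
# so only key 2 survives.
attn = [[0.9, 0.1, 0.0],
        [0.0, 0.1, 0.9]]
mask = retained_keys(attn, recent=1, threshold=0.5)
```

Because the rule needs only attention scores already computed during decoding, it requires no fine-tuning, matching the paper's claim.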
From LLM to Conversational Agent: A Memory Enhanced Architecture with Fine-Tuning of Large Language Models
Liu, Na, Chen, Liangyu, Tian, Xiaoyu, Zou, Wei, Chen, Kaijiang, Cui, Ming
While LLMs exhibit high levels of performance in isolated tasks, creating an agent that can sustain coherent, context-aware, and purpose-driven conversations remains an intricate endeavor. The need for a more sophisticated framework that leverages the strengths of LLMs while addressing their limitations in conversational settings has become increasingly apparent. In response to this need, we introduce the RAISE (Reasoning and Acting through Scratchpad and Examples) architecture. RAISE represents a refined enhancement of the existing ReAct (Yao et al., 2023) framework, specifically designed to augment the capabilities of conversational agents. This paper presents a detailed exploration of RAISE, highlighting its unique components. RAISE, an enhancement of the ReAct framework, incorporates a dual-component memory system, mirroring human short-term and long-term memory, to maintain context and continuity in conversations. It entails a comprehensive agent construction scenario, including phases like Conversation Selection, Scene Extraction, CoT Completion, and Scene Augmentation, leading to the LLMs Training phase. This approach appears to enhance agent controllability and adaptability in complex, multi-turn dialogues. Our preliminary evaluations in a real estate sales context suggest that RAISE has some advantages over traditional agents, indicating its potential for broader applications.
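The dual-component memory described above can be caricatured as a bounded short-term scratchpad for the ongoing dialogue plus a long-term store of retrievable examples. This toy sketch is an analogy to the design, not the paper's implementation; the class and its naive word-overlap retrieval are our assumptions:

```python
# Toy dual-memory sketch: short-term scratchpad (recent turns) + long-term
# example store with a naive keyword-overlap recall.
from collections import deque

class DualMemory:
    def __init__(self, short_term_size=6):
        self.scratchpad = deque(maxlen=short_term_size)  # short-term memory
        self.examples = []                               # long-term memory

    def observe(self, turn):
        self.scratchpad.append(turn)  # oldest turns fall off automatically

    def remember(self, example):
        self.examples.append(example)

    def recall(self, query, k=2):
        # rank long-term examples by word overlap with the current query
        q = set(query.lower().split())
        ranked = sorted(self.examples,
                        key=lambda e: len(q & set(e.lower().split())),
                        reverse=True)
        return ranked[:k]

mem = DualMemory(short_term_size=2)
mem.observe("user: any 3-bed listings?")
mem.observe("agent: yes, two near downtown")
mem.remember("3-bed listings respond with price range first")
mem.remember("unrelated small talk example")
top = mem.recall("3-bed listings downtown", k=1)
```

A real agent would recall curated conversation examples (the "Examples" in RAISE) with an embedding retriever rather than word overlap.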
Learning to Relate to Previous Turns in Conversational Search
Mo, Fengran, Nie, Jian-Yun, Huang, Kaiyu, Mao, Kelong, Zhu, Yutao, Li, Peng, Liu, Yang
Conversational search allows a user to interact with a search system in multiple turns. A query is strongly dependent on the conversation context. An effective way to improve retrieval effectiveness is to expand the current query with historical queries. However, not all previous queries are related to, and useful for expanding, the current query. In this paper, we propose a new method to select relevant historical queries that are useful for the current query. To cope with the lack of labeled training data, we use a pseudo-labeling approach to annotate useful historical queries based on their impact on the retrieval results. The pseudo-labeled data are used to train a selection model. We further propose a multi-task learning framework to jointly train the selector and the retriever during fine-tuning, allowing us to mitigate the possible inconsistency between the pseudo labels and the changed retriever. Extensive experiments on four conversational search datasets demonstrate the effectiveness and broad applicability of our method compared with several strong baselines.
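The pseudo-labeling step can be sketched directly from its description: a historical query is labeled useful if expanding the current query with it improves a retrieval score. `retrieve_score` below stands in for a real retriever plus relevance judgment (an assumption for illustration):

```python
# Pseudo-labeling sketch: label a historical query 1 (useful) if it improves
# retrieval when used to expand the current query, else 0.

def pseudo_label(history, current, retrieve_score, margin=0.0):
    """Return (historical_query, label) pairs; label 1 = useful expansion."""
    base = retrieve_score(current)
    labels = []
    for h in history:
        expanded = h + " " + current
        labels.append((h, 1 if retrieve_score(expanded) > base + margin else 0))
    return labels

# Toy retriever: score = term overlap with a single relevant document.
doc = set("tallest building in paris eiffel tower".split())
score = lambda q: len(doc & set(q.split())) / len(doc)
labels = pseudo_label(["eiffel tower history", "weather tomorrow"],
                      "how tall is it", score)
```

These labels then supervise the selector, which the paper further co-trains with the retriever to keep the labels and the evolving retriever consistent.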
Learning Implicit User Profiles for Personalized Retrieval-Based Chatbot
Qian, Hongjin, Dou, Zhicheng, Zhu, Yutao, Ma, Yueyuan, Wen, Ji-Rong
In this paper, we explore the problem of developing personalized chatbots. A personalized chatbot is designed as a digital chatting assistant for a user. The key characteristic of a personalized chatbot is that it should have a consistent personality with the corresponding user. It can talk the same way as the user when it is delegated to respond to others' messages. We present a retrieval-based personalized chatbot model, namely IMPChat, to learn an implicit user profile from the user's dialogue history. We argue that the implicit user profile is superior to the explicit user profile regarding accessibility and flexibility. IMPChat aims to learn an implicit user profile by modeling the user's personalized language style and personalized preferences separately. To learn a user's personalized language style, we build language models from shallow to deep levels using the user's historical responses. To model a user's personalized preferences, we explore the conditional relations underneath each post-response pair of the user. The personalized preferences are dynamic and context-aware: we assign higher weights to those historical pairs that are topically related to the current query when aggregating the personalized preferences. We match each response candidate against the personalized language style and the personalized preferences, respectively, and fuse the two matching signals to determine the final ranking score. Comprehensive experiments on two large datasets show that our method outperforms all baseline models.
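The scoring described above combines two signals: a context-weighted preference score over historical post-response pairs, and a style-matching score, fused into one ranking score. A minimal sketch under our own naming (the functions and the convex-combination fusion are illustrative assumptions):

```python
# Two-signal fusion sketch: aggregate per-history-pair similarities with
# topical weights, then fuse with a style score into a final ranking score.

def preference_score(candidate_sim, topical_weights):
    """candidate_sim: similarity of the candidate to each historical pair.
    topical_weights: how topically related each pair is to the current query.
    Pairs that are more on-topic contribute more to the aggregate."""
    total = sum(topical_weights)
    if total == 0:
        return 0.0
    return sum(s * w for s, w in zip(candidate_sim, topical_weights)) / total

def fuse(style_score, pref_score, alpha=0.5):
    # simple convex combination of the two matching signals
    return alpha * style_score + (1 - alpha) * pref_score

pref = preference_score([0.8, 0.2], [1.0, 0.0])  # only the first pair is on-topic
final = fuse(0.6, pref, alpha=0.5)
```

The context-aware weighting is the key design choice: the same candidate can rank differently depending on which historical pairs the current query activates.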